High Performance Dense Linear Systems Solution on a Beowulf Cluster
Abstract
In this paper, we describe techniques that can improve the performance of dense linear system solution, based on LU, LL^T and QR factorizations, on distributed memory multiprocessors, including cluster computers. The most important of these are refinements of the algorithmic blocking technique that remove most of the communication startup costs it introduces and make it superior to storage blocking in terms of communication volume costs. These refinements rely primarily on pipelined communication and the choice of a small storage block size. Two other techniques, optimizing the memory behavior of multiple row swaps and coalescing the vector-matrix multiplies in QR, also afford modest improvements in storage blocking and serial performance. Performance results on a 24-node Beowulf cluster with 550 MHz dual SMP Pentium III nodes, connected by a COTS switch with 10 Mb/s links, show that algorithmic blocking generally improves performance by 15–30% or more for these computations over a large range of system sizes.
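The distinction the abstract draws between the algorithmic block size and the storage block size can be illustrated with a minimal serial sketch. This is not the paper's implementation: the names `blocked_lu` and `panel_owners`, the panel width `w`, the storage block size `r`, and the process-column count `p` are all hypothetical, and the block-cyclic layout is an assumed convention. The first function factors a nonsingular matrix panel by panel; the second reports which process columns would hold a panel under that assumed layout.

```python
def blocked_lu(a, w):
    """Right-looking LU with partial pivoting, using panels of width w.

    Returns (lu, perm): lu holds unit-lower L below the diagonal and U on
    and above it, and row i of L*U equals row perm[i] of a.
    Assumes a is square and nonsingular (illustrative sketch only).
    """
    lu = [[float(x) for x in row] for row in a]
    n = len(lu)
    perm = list(range(n))
    for k0 in range(0, n, w):
        k1 = min(k0 + w, n)
        # Panel factorization: unblocked LU on columns k0..k1-1.
        for j in range(k0, k1):
            p = max(range(j, n), key=lambda i: abs(lu[i][j]))
            lu[j], lu[p] = lu[p], lu[j]          # swap whole rows
            perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                lu[i][j] /= lu[j][j]
                for c in range(j + 1, k1):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # Row block of U: forward substitution with the unit-lower panel top.
        for i in range(k0, k1):
            for t in range(k0, i):
                for c in range(k1, n):
                    lu[i][c] -= lu[i][t] * lu[t][c]
        # Trailing update: A22 -= L21 * U12.
        for i in range(k1, n):
            for t in range(k0, k1):
                for c in range(k1, n):
                    lu[i][c] -= lu[i][t] * lu[t][c]
    return lu, perm


def panel_owners(k0, w, r, p):
    """Process columns holding panel columns k0..k0+w-1 under an assumed
    block-cyclic layout with storage block size r over p process columns."""
    return sorted({(j // r) % p for j in range(k0, k0 + w)})
```

Under these assumptions, `panel_owners(8, 4, 4, 3)` places the whole panel in one process column (storage block size equal to the panel width), while `panel_owners(8, 4, 1, 4)` spreads the same panel over four process columns, which is the situation a small storage block size creates.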
Related resources
Parallel Algorithms for LQ Optimal Control of Discrete-Time Periodic Linear Systems
This paper analyzes the performance of two parallel algorithms for solving the linear-quadratic optimal control problem arising in discrete-time periodic linear systems. The algorithms perform a sequence of orthogonal reordering transformations on formal matrix products associated with the periodic linear system, and then employ the so-called matrix disk function to solve the resulting discret...
The Scyld Beowulf System
Beowulf systems are high-performance computers constructed from commodity hardware connected by a private internal network and running an open source operating system infrastructure. This tutorial will cover the design, installation, deployment, and running of Beowulf clusters. It will focus on the Scyld Beowulf system, and will include specific examples and a complete cluster installation CD-ROM.
A Power Efficient Linear Equation Solver on a Multi-FPGA Accelerator
This paper presents an approach to exploring a commercial multi-FPGA system as a high-performance accelerator, and addresses the problem of solving an LU-decomposed linear system of equations using forward and back substitution. A block-based Right-Hand-Side solver algorithm is described and a novel data flow and memory architectures that can support arbitrary data types, block sizes, and matrix siz...
Development of Beowulf Cluster to Perform Large Datasets Simulations in Educational Institutions
This paper presents the design and development of a Beowulf cluster that institutions can use for research requiring high-performance computing. In many industrial and scientific applications there is often a need to analyse large datasets using the computational power of distributed and parallel systems. The High Performance Computing (HPC) components are very expensive but wit...
Efficient Parallel Solvers for Large Dense Systems of Linear Interval Equations
Verified solvers for dense linear (interval-)systems require a lot of resources, both in terms of computing power and memory usage. Computing a verified solution of large dense linear systems (dimension n > 10000) on a single machine quickly approaches the limits of today’s hardware. Therefore, an efficient parallel verified solver for distributed memory systems is needed. In this work we prese...
Publication date: 2001